Goto

Collaborating Authors

 Sololá


Depth and Autonomy: A Framework for Evaluating LLM Applications in Social Science Research

Sanaei, Ali, Rajabzadeh, Ali

arXiv.org Artificial Intelligence

Large language models (LLMs) are increasingly utilized by researchers across a wide range of domains, and qualitative social science is no exception; however, this adoption faces persistent challenges, including interpretive bias, low reliability, and weak auditability. We introduce a framework that situates LLM usage along two dimensions, interpretive depth and autonomy, thereby offering a straightforward way to classify LLM applications in qualitative research and to derive practical design recommendations. We present the state of the literature with respect to these two dimensions, based on all published social science papers available on Web of Science that use LLMs as a tool and not strictly as the subject of study. Rather than granting models expansive freedom, our approach encourages researchers to decompose tasks into manageable segments, much as they would when delegating work to capable undergraduate research assistants. By maintaining low levels of autonomy and selectively increasing interpretive depth only where warranted and under supervision, one can plausibly reap the benefits of LLMs while preserving transparency and reliability.


A Computational Method for Measuring "Open Codes" in Qualitative Analysis

Chen, John, Lotsos, Alexandros, Zhao, Lexie, Wang, Caiyi, Hullman, Jessica, Sherin, Bruce, Wilensky, Uri, Horn, Michael

arXiv.org Artificial Intelligence

Qualitative analysis is critical to understanding human datasets in many social science disciplines. Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets. Yet, meeting methodological expectations (such as "as exhaustive as possible") can be challenging. While many machine learning (ML)/generative AI (GAI) studies have attempted to support open coding, few have systematically measured or evaluated GAI outcomes, increasing potential bias risks. Building on Grounded Theory and Thematic Analysis theories, we present a computational method to measure and identify potential biases from "open codes" systematically. Instead of operationalizing human expert results as the "ground truth," our method is built upon a team-based approach between human and machine coders. We experiment with two HCI datasets to establish this method's reliability by 1) comparing it with human analysis, and 2) analyzing its output stability. We present evidence-based suggestions and example workflows for ML/GAI to support open coding.


Prompts Matter: Comparing ML/GAI Approaches for Generating Inductive Qualitative Coding Results

Chen, John, Lotsos, Alexandros, Zhao, Lexie, Wang, Grace, Wilensky, Uri, Sherin, Bruce, Horn, Michael

arXiv.org Artificial Intelligence

Inductive qualitative methods have been a mainstay of education research for decades, yet it takes much time and effort to conduct rigorously. Recent advances in artificial intelligence, particularly with generative AI (GAI), have led to initial success in generating inductive coding results. Like human coders, GAI tools rely on instructions to work, and how to instruct it may matter. To understand how ML/GAI approaches could contribute to qualitative coding processes, this study applied two known and two theory-informed novel approaches to an online community dataset and evaluated the resulting coding results. Our findings show significant discrepancies between ML/GAI approaches and demonstrate the advantage of our approaches, which introduce human coding processes into GAI prompts.


Generative AI Tools in Academic Research: Applications and Implications for Qualitative and Quantitative Research Methodologies

Perkins, Mike, Roe, Jasper

arXiv.org Artificial Intelligence

This study examines the impact of Generative Artificial Intelligence (GenAI) on academic research, focusing on its application to qualitative and quantitative data analysis. As GenAI tools evolve rapidly, they offer new possibilities for enhancing research productivity and democratising complex analytical processes. However, their integration into academic practice raises significant questions regarding research integrity and security, authorship, and the changing nature of scholarly work. Through an examination of current capabilities and potential future applications, this study provides insights into how researchers may utilise GenAI tools responsibly and ethically. We present case studies that demonstrate the application of GenAI in various research methodologies, discuss the challenges of replicability and consistency in AI-assisted research, and consider the ethical implications of increased AI integration in academia. This study explores both qualitative and quantitative applications of GenAI, highlighting tools for transcription, coding, thematic analysis, visual analytics, and statistical analysis. By addressing these issues, we aim to contribute to the ongoing discourse on the role of AI in shaping the future of academic research and provide guidance for researchers exploring the rapidly evolving landscape of AI-assisted research tools and research.


MERTech: Instrument Playing Technique Detection Using Self-Supervised Pretrained Model With Multi-Task Finetuning

Li, Dichucheng, Ma, Yinghao, Wei, Weixing, Kong, Qiuqiang, Wu, Yulun, Che, Mingjin, Xia, Fan, Benetos, Emmanouil, Li, Wei

arXiv.org Artificial Intelligence

Instrument playing techniques (IPTs) constitute a pivotal component of musical expression. However, the development of automatic IPT detection methods suffers from limited labeled data and inherent class imbalance issues. In this paper, we propose to apply a self-supervised learning model pre-trained on large-scale unlabeled music data and finetune it on IPT detection tasks. This approach addresses data scarcity and class imbalance challenges. Recognizing the significance of pitch in capturing the nuances of IPTs and the importance of onset in locating IPT events, we investigate multi-task finetuning with pitch and onset detection as auxiliary tasks. Additionally, we apply a post-processing approach for event-level prediction, where an IPT activation initiates an event only if the onset output confirms an onset in that frame. Our method outperforms prior approaches in both frame-level and event-level metrics across multiple IPT benchmark datasets. Further experiments demonstrate the efficacy of multi-task finetuning on each IPT class.